Meta-prediction of coronary artery disease risk

Shang-Fu Chen, Sang Eun Lee, Hossein Javedani Sadaei, Jun-Bean Park, Ahmed Khattab, Jei-Fu Chen, Corneliu Henegar, Nathan E. Wineinger, Evan D. Muse, Ali Torkamani
Nature Medicine
Scripps Research Translational Institute

Table of Contents

Overall Summary

Study Background and Main Findings

This study addresses the critical public health challenge of Coronary Artery Disease (CAD), a leading cause of global morbidity and mortality, by aiming to improve upon current risk prediction methods. Existing tools often show limited accuracy, particularly in diverse populations or specific subgroups, and struggle to effectively integrate the growing wealth of genetic information. The primary objective was to develop and validate a novel machine learning framework – termed 'meta-prediction' – that integrates a wide range of unmodifiable risk factors (like age and numerous genetic predispositions summarized by Polygenic Risk Scores, or PRSs) with modifiable factors (clinical measurements, lifestyle information) to generate more accurate, personalized, and actionable 10-year CAD risk estimates.

Methodologically, the researchers utilized large-scale data from the UK Biobank (UKBB), strategically dividing participants into two groups: one with existing CAD ('prevalent cohort') and one initially free of CAD who were followed over time ('incident cohort'). The core innovation involved a two-stage process: first, training numerous baseline predictive models on the prevalent cohort to estimate various risk factors and diagnoses; second, using the outputs (predictions) from these baseline models as new input features – 'meta-features' – along with directly measured data, to train a final ensemble machine learning model (specifically XGBoost) predicting the 10-year risk of developing CAD in the incident cohort. This hierarchical approach allowed the model to learn complex patterns and interactions from over 1,700 initial features, ultimately selecting the 50 most informative ones, including 15 meta-features and 22 PRSs.
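The two-stage design described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' pipeline: the study used XGBoost on UK Biobank features, while the sketch below uses scikit-learn stand-ins and made-up labels.

```python
# Minimal sketch of the two-stage "meta-prediction" idea (hypothetical data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic "prevalent" and "incident" cohorts with 5 raw features each.
X_prev = rng.normal(size=(500, 5))
y_prev_diag = (X_prev[:, 0] + X_prev[:, 1] > 0).astype(int)      # stand-in diagnosis label

X_inc = rng.normal(size=(500, 5))
y_inc = (X_inc[:, 0] + 0.5 * X_inc[:, 2] > 0.5).astype(int)      # stand-in 10-yr CAD label

# Stage 1: a baseline model trained on the prevalent cohort predicts a diagnosis;
# its predicted probabilities on the incident cohort become a "meta-feature".
baseline = LogisticRegression().fit(X_prev, y_prev_diag)
meta_feature = baseline.predict_proba(X_inc)[:, 1].reshape(-1, 1)

# Stage 2: the final ensemble model sees the raw features plus the meta-feature.
X_final = np.hstack([X_inc, meta_feature])
final = GradientBoostingClassifier(random_state=0).fit(X_final, y_inc)
print(X_final.shape)  # (500, 6): 5 raw features + 1 meta-feature
```

In the actual framework this stage-1 step is repeated for many baseline targets, producing the 15 meta-features that survived feature selection.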

The resulting meta-prediction model demonstrated significantly improved performance. Within the UKBB test set, it achieved high discrimination (area under the curve, AUC, of 0.84, where 1.0 is perfect discrimination and 0.5 is chance). Crucially, this high performance was largely maintained (AUC 0.81) upon external validation in the independent and diverse All of Us (AoU) research program cohort, indicating good generalizability. The model substantially outperformed standard clinical risk scores such as the Pooled Cohort Equations (PCE) and QRISK3 (average AUC improvement >10%; average improvement in area under the precision-recall curve, 67%), and also showed better risk reclassification (Net Reclassification Index 0.14-0.21). Performance gains were particularly notable in subgroups traditionally considered low-risk. Furthermore, model interpretation using SHAP values identified key risk drivers, and simulations suggested the framework could guide personalized interventions by predicting differential risk reduction based on an individual's genetic profile and risk subgroup.

In conclusion, the study presents a powerful, integrative meta-prediction framework that significantly advances CAD risk prediction accuracy and generalizability compared to current standards. By effectively leveraging comprehensive genetic and non-genetic data through a sophisticated machine learning approach, the framework offers potential for more precise risk stratification and personalized prevention strategies. The findings underscore the importance of incorporating broad genetic information and complex interactions to capture individual CAD susceptibility more effectively, paving the way towards precision cardiology.

Research Impact and Future Directions

This study successfully developed and validated a sophisticated machine learning framework for predicting 10-year Coronary Artery Disease (CAD) risk, demonstrating superior performance compared to established clinical scores and previous research models. The 'meta-prediction' approach, integrating a vast array of genetic (numerous Polygenic Risk Scores - PRSs) and non-genetic factors (clinical, lifestyle, biomarker data) through intermediate predictive steps ('meta-features'), represents a significant methodological advance.

The model's high accuracy (AUC 0.84 in UK Biobank, 0.81 in the diverse All of Us cohort) and improved ability to reclassify individuals into more appropriate risk categories (Net Reclassification Index 0.14-0.21 vs standard scores) underscore its potential clinical utility. Particularly noteworthy is its enhanced performance in subgroups often considered lower risk by traditional methods (e.g., younger individuals, females), suggesting it captures risk pathways missed by simpler models. The framework's ability to simulate differential intervention benefits based on genetic risk profiles offers a promising avenue towards personalized prevention strategies, although these simulations are currently model-based hypotheses requiring prospective clinical validation.

While the findings are compelling, it is crucial to recognize the study's observational nature; the model identifies complex associations, but direct causation for all contributing factors is not established by this work alone. Although validated in the diverse All of Us cohort, further assessment in specific underrepresented populations and age groups is warranted. The practical implementation of such a complex model faces hurdles beyond predictive accuracy, including seamless integration into clinical workflows, clinician training and acceptance, regulatory approval, and demonstration of cost-effectiveness. The infrastructure for such deployment is emerging but not yet widespread.

In essence, this research provides a powerful demonstration of how advanced machine learning, applied to large-scale, multi-modal data including comprehensive genetics, can significantly refine cardiovascular risk prediction. It marks a substantial step towards more personalized and precise prevention of CAD. However, translating this potential into routine clinical impact requires overcoming implementation challenges and further validating the personalized intervention aspect through dedicated clinical trials.

Critical Analysis and Recommendations

Highlights Key Performance Metrics and Superiority (written-content)
Observation: The abstract highlights key quantitative performance metrics (AUC 0.84 in UK Biobank, 0.81 in All of Us) and the model's superiority over existing methods. Methodological Context: These metrics result from testing the final meta-prediction model on hold-out internal and external validation cohorts. Clinical/Practical Significance: Demonstrates the model's high discriminative accuracy and potential impact compared to current standards. Implementation Implications: Strong performance metrics are crucial for justifying further development and potential clinical adoption.
Section: Abstract
Emphasizes Actionability and Personalization (written-content)
Observation: The abstract mentions the framework's ability to generate individualized risk reduction profiles and notes that genetic risk influences intervention benefits. Methodological Context: This stems from model simulations where risk factors are perturbed. Clinical/Practical Significance: Suggests potential clinical utility beyond risk stratification, enabling tailored prevention strategies. Implementation Implications: Points towards a future application where the model could guide personalized treatment decisions, pending further validation.
Section: Abstract
Explicitly State Study Design (written-content)
Issue: The abstract does not explicitly state the study design (observational cohort study). Impact: While inferable, explicitly stating the design provides immediate context about the nature of the evidence (associations derived from observation) for readers scanning the abstract, enhancing clarity without requiring further reading.
Section: Abstract
Critiques Prior Integration Strategies (written-content)
Observation: The introduction effectively reviews prior attempts to integrate genetic and clinical data, noting their limitations (e.g., marginal improvements from linear combinations, failure to capture interactions). Methodological Context: This review establishes the scientific gap the current study aims to fill. Clinical/Practical Significance: Justifies the need for the novel, more complex meta-prediction framework proposed in the paper. Implementation Implications: Underscores why simpler approaches may be insufficient for achieving personalized risk prediction.
Section: Introduction
Clearly Introduces the Novel Framework (written-content)
Observation: The introduction clearly articulates the proposed 'omnigenic, integrative, meta-prediction framework' and its differentiating features (numerous PRSs, ML for interactions, meta-feature integration). Methodological Context: This defines the study's unique methodological contribution. Clinical/Practical Significance: Sets expectations for how the framework aims to improve upon previous work. Implementation Implications: Provides a conceptual blueprint of the novel approach.
Section: Introduction
Explicitly State Central Hypothesis (written-content)
Issue: The introduction outlines the framework and goals but lacks an explicit, concise statement of the central hypothesis. Impact: Adding a clear hypothesis (e.g., that the framework will achieve superior accuracy and stratification by capturing complex interactions) would provide a sharper focus and directly link the proposed methods to the expected advantages.
Section: Introduction
Rigorous Comparative Evaluation Shows Superior Performance (written-content)
Observation: The meta-prediction model significantly outperformed established clinical scores (PCE, QRISK3, PREVENT) and research models across multiple metrics (e.g., AUROC >10% higher, AUPRC 67% higher on average). Methodological Context: Comparisons were made on a hold-out test set using standard evaluation metrics. Clinical/Practical Significance: Provides strong quantitative evidence for the model's superior predictive ability compared to current standards. Implementation Implications: Supports the potential for the model to replace or augment existing clinical risk assessment tools.
Section: Results
Strong External Validation and Generalizability (written-content)
Observation: The model maintained high performance (AUC 0.81) when validated externally in the diverse All of Us (AoU) cohort, with similar performance across European, African, and Hispanic ancestry groups. Methodological Context: A streamlined model was tested on the independent AoU dataset. Clinical/Practical Significance: Demonstrates the model's robustness and generalizability beyond the UKBB training data, suggesting broader applicability. Implementation Implications: Increases confidence in the model's potential utility across different populations, although further validation in specific groups is still beneficial.
Section: Results
Demonstrates Potential for Personalized Intervention Guidance (written-content)
Observation: Simulations showed that the predicted absolute risk reduction from interventions (lowering LDL, HbA1c, SBP) varied significantly based on individuals' underlying genetic risk and their assigned risk subgroup. Methodological Context: Intervention effects were simulated by altering feature values in the trained model and observing changes in predicted risk. Clinical/Practical Significance: Highlights the framework's potential to guide personalized prevention by identifying individuals likely to derive the most benefit from specific interventions. Implementation Implications: Suggests a future clinical application for tailoring treatment intensity, contingent on prospective validation.
Section: Results
Enhance Clinical Interpretation of Risk Subgroups (written-content)
Issue: While distinct risk subgroups were identified via SHAP value clustering, the Results section primarily describes their differentiating features statistically without offering much clinical interpretation. Impact: Briefly connecting the differentiating features (e.g., high CAD PRS dominance vs. other factors) to potential underlying clinical or pathophysiological profiles within the Results would make the significance of these subgroups more immediately apparent and tangible.
Section: Results
Clear Summary of Core Findings and Superiority (written-content)
Observation: The Discussion effectively synthesizes the main finding – the meta-prediction framework's superior and more generalizable performance compared to existing standards. Methodological Context: This summarizes the key results from comparative analyses. Clinical/Practical Significance: Clearly states the paper's primary contribution and its potential impact on CAD risk prediction. Implementation Implications: Provides a concise take-home message regarding the framework's advantages.
Section: Discussion
Emphasizes Importance of Genetic Risk and Meta-Features (written-content)
Observation: The Discussion emphasizes the crucial role of genetic risk, particularly via meta-features derived from unmodifiable factors (genetics, age, sex), in achieving superior prediction. Methodological Context: This interpretation is based on SHAP value analysis showing the high importance of these features. Clinical/Practical Significance: Reinforces the study's theme about the power of integrating genetics and highlights potential mechanisms driving the model's success. Implementation Implications: Underscores the value of incorporating comprehensive genetic data in risk models.
Section: Discussion
Elaborate on Mechanisms of Differential Intervention Benefit (written-content)
Issue: The Discussion notes that genetic risk mediates differential intervention benefits but doesn't explore potential mechanisms. Impact: Speculating on how genetics might influence intervention response (e.g., pharmacogenetics, baseline risk levels, pathway interactions) would add depth, moving beyond observation towards biological/clinical implications and stimulating further research into personalized prevention mechanisms.
Section: Discussion
Explicit Description of Meta-Prediction Framework (written-content)
Observation: The Methods explicitly detail the novel meta-prediction strategy, explaining the two-stage process involving baseline models (trained on prevalent cohort) generating meta-features for the final incident model. Methodological Context: This describes the core innovation of the study. Clinical/Practical Significance: Allows understanding of how diverse information streams are integrated. Implementation Implications: Provides the blueprint necessary for others to understand and potentially replicate or adapt the approach.
Section: Methods
Rigorous External Validation Strategy (written-content)
Observation: The Methods detail the external validation strategy using the All of Us (AoU) cohort, including feature mapping, handling data differences, and testing specific streamlined/generalizable models. Methodological Context: This describes the steps taken to assess model robustness in an independent, diverse population. Clinical/Practical Significance: Significantly strengthens the study's claims of generalizability and potential broad applicability. Implementation Implications: Provides evidence supporting the model's potential use beyond the initial development cohort.
Section: Methods
Specify Clustering Cut Height or Cluster Number Determination Method (written-content)
Issue: The Methods state that subgroup identification used hierarchical clustering with fixed-height tree cutting to define five clusters, but the specific cut height or the method used to determine 'five' as the optimal number is not provided. Impact: This missing parameter hinders the exact reproducibility of the subgroup analysis, a key component supporting the personalized intervention findings.
Section: Methods
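To make this reproducibility concern concrete: with fixed-height tree cutting, the number of subgroups is determined entirely by the chosen cut height, so omitting it prevents exact replication. The sketch below uses hypothetical SHAP-profile data (not the study's) to show how the cluster count shifts with the height parameter.

```python
# Fixed-height tree cutting on hypothetical per-participant SHAP profiles.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
shap_profiles = rng.normal(size=(100, 10))   # 100 participants x 10 features (made up)

Z = linkage(shap_profiles, method="ward")    # hierarchical clustering

# The subgroup count depends entirely on the cut height: higher cuts merge
# more branches and yield fewer clusters.
for height in (5.0, 10.0, 20.0):
    labels = fcluster(Z, t=height, criterion="distance")
    print(height, labels.max())
```

Reporting either the height `t` or the rule that selected "five" clusters would pin the analysis down.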

Section Analysis

Abstract

Key Aspects

Strengths

Suggestions for Improvement

Introduction

Key Aspects

Strengths

Suggestions for Improvement

Results

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Fig. 1| Overview of cohort construction, model development and performance...
Full Caption

Fig. 1| Overview of cohort construction, model development and performance assessment for 10-year incident CAD risk meta-prediction in the UKBB.

Figure/Table Image (Page 3)
First Reference in Text
For incident CAD prediction, we assembled a cohort of 160,159 participants, with 9.02% being diagnosed with CAD after baseline over the 10-year follow-up period (Fig. 1a).
Description
  • Study Population Origin: This part of the figure (Panel a) visually outlines how the researchers selected the groups of people for their study from a larger dataset called the UK Biobank (UKBB). It starts with an initial pool of 502,414 individuals.
  • Exclusion Process: The diagram shows that certain individuals were removed (excluded) based on specific criteria. For example, 15,207 people were excluded because they lacked necessary data (like age, sex, or genetic information) or withdrew consent. Another large group of 147,540 was excluded due to issues with their electronic health records (EHR), such as insufficient follow-up time or number of entries, or developing the disease after the study's 10-year timeframe.
  • Incident CAD Cohort Derivation: After exclusions, the remaining 487,207 individuals with genotype data were divided. One key group formed is the 'incident CAD cohort', consisting of 160,159 participants. This group includes individuals who did not have Coronary Artery Disease (CAD - a condition affecting the heart's blood supply) at the start but were monitored for 10 years. The text specifically mentions this cohort size.
  • Incident CAD Cohort Composition: Within this incident CAD cohort of 160,159 people, the figure and text state that 14,445 individuals (which corresponds to 9.02% mentioned in the text, though the exact percentage isn't in the figure panel itself) developed CAD during the 10-year follow-up period. The remaining 145,714 served as controls (people who did not develop CAD during that time).
  • Prevalent CAD Cohort Derivation: Panel 'a' also depicts the derivation of a 'prevalent CAD cohort' (179,508 participants), which includes individuals who already had a CAD diagnosis at the beginning of the study (16,190 cases vs 163,318 controls). This cohort is used for different analyses, like training initial models.
Scientific Validity
  • Systematic Cohort Definition: The systematic approach to cohort definition and participant exclusion based on predefined criteria (genotype availability, EHR data sufficiency) is appropriate for establishing well-defined prevalent and incident cohorts for prediction modeling.
  • EHR Data Filtering Criteria: The criteria for EHR data sufficiency (minimum entries, follow-up duration) are explicitly stated (though briefly in the figure, likely detailed in Methods), which adds rigor to the control group definition and minimizes bias from incomplete follow-up.
  • Methodological Separation of Cohorts: The clear separation of prevalent (existing disease) and incident (new disease) cohorts is methodologically sound, as they serve distinct purposes in model training (baseline prediction vs. prospective prediction).
  • Statistical Power: The large sample sizes derived (N=160,159 for incident cohort) provide substantial statistical power for the subsequent prediction model development and validation.
  • Plausibility of Event Rate: The number of incident cases (14,445) relative to the total incident cohort size (160,159) yields an event rate (9.02%) that appears plausible for a 10-year follow-up in this age group, although comparison to UKBB overall rates would be informative.
Communication
  • Flowchart clarity: The flowchart format effectively illustrates the sequential filtering process applied to the UK Biobank population to derive the study cohorts. The visual flow from the initial large number to the final prevalent and incident cohorts is intuitive.
  • Numerical transparency: The specific numbers for exclusions and final cohort sizes are clearly presented at each stage, enhancing transparency and allowing readers to understand the scale of data filtration.
  • Cohort distinction: The distinction between the prevalent and incident cohorts is clearly demarcated, which is crucial for understanding the subsequent modeling approaches.
  • Exclusion criteria detail: While the main flow is clear, the reasons for exclusion could be slightly more detailed directly within the flowchart, although it's acknowledged that space is limited and details are likely in the Methods section.
Table 1 | Reclassification analysis in the UKBB population
Figure/Table Image (Page 4)
First Reference in Text
Additional demographic and clinical characteristics of the prevalent and incident risk cohorts are available in Table 1.
Description
  • Purpose of the Table: This table compares a new prediction model ('Meta-prediction') against three existing clinical risk scores (PCE, QRISK3, PREVENT) used to estimate the risk of developing Coronary Artery Disease (CAD).
  • Method and Population: It uses a method called 'reclassification analysis' performed on a specific group of people from the UK Biobank study (the test set of the incident CAD cohort, involving 2,889 people who developed CAD ('Events') and 29,143 who did not ('No event') over the study period).
  • Reclassification Concept: Reclassification analysis checks how many individuals are assigned to different risk categories (e.g., low, borderline, intermediate, high risk) by the new model compared to the old scores. The table shows the percentage of individuals moving between these categories.
  • Example Reclassification Values: For each comparison (Meta-prediction vs. PCE, vs. QRISK3, vs. PREVENT), the table shows percentages of individuals reclassified. For instance, when comparing against PCE, among those who did not have an event, the meta-prediction model moved 23.88% of individuals from the PCE intermediate-risk category (7.5-20%) to its own low-risk category (<5%). Conversely, among those who did have an event, it moved 17.03% from the PCE intermediate-risk category to its own intermediate-risk category.
  • Net Reclassification Index (NRI): The table calculates a summary statistic called the Net Reclassification Index (NRI). This index quantifies the overall improvement in classification, considering both correct upward movement for cases (events) and correct downward movement for controls (no events), while penalizing incorrect movements. A positive NRI suggests the new model is better at classifying people.
  • Key NRI Results: The calculated NRI values indicate the meta-prediction model achieved an improvement over the existing scores: NRI = 0.20 compared to PCE, NRI = 0.14 compared to QRISK3, and NRI = 0.21 compared to PREVENT.
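The categorical NRI described above (net upward movement among events plus net downward movement among non-events) can be computed in a few lines. This is an illustrative sketch with made-up risk categories, not the authors' code.

```python
def net_reclassification_index(old_cat, new_cat, is_event):
    """Categorical NRI: (P(up|event) - P(down|event)) + (P(down|nonevent) - P(up|nonevent)).
    Categories are ordinal integers; higher means higher risk."""
    events = [(o, n) for o, n, e in zip(old_cat, new_cat, is_event) if e]
    nonevents = [(o, n) for o, n, e in zip(old_cat, new_cat, is_event) if not e]

    def frac(pairs, moved):
        return sum(1 for o, n in pairs if moved(n, o)) / len(pairs)

    up = lambda n, o: n > o
    down = lambda n, o: n < o
    return (frac(events, up) - frac(events, down)) + (frac(nonevents, down) - frac(nonevents, up))

# Toy example: 4 events and 4 non-events across risk categories 0-3.
old = [1, 1, 2, 2, 1, 2, 2, 3]
new = [2, 3, 2, 1, 0, 1, 2, 3]
ev  = [1, 1, 1, 1, 0, 0, 0, 0]
print(net_reclassification_index(old, new, ev))  # 0.75
```

A positive value, as in the table's 0.14-0.21 range, means reclassification helped more often than it hurt.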
Scientific Validity
  • Appropriateness of Reclassification Analysis: Reclassification analysis is a standard and appropriate method for evaluating the incremental value of a new prediction model or biomarker beyond existing ones, particularly concerning changes in clinical risk categories.
  • Clinical Relevance of Thresholds: The use of clinically relevant risk thresholds (e.g., 7.5% for PCE/PREVENT, 10% for QRISK3) for defining risk categories makes the analysis directly applicable to potential clinical decision-making scenarios.
  • Validity of NRI Calculation: Calculating the Net Reclassification Index (NRI) provides a quantitative summary of the improvement, complementing visual inspection of the reclassification tables. The calculation appears correctly applied based on the standard formula (sum of net improvements for events and non-events).
  • Use of Hold-Out Test Set: The analysis is performed on a hold-out test set (n=32,032 total, split into events and no-events), ensuring the evaluation is independent of the model training data, which is crucial for assessing generalizability.
  • Consistency of Findings: The results consistently show positive NRI values across comparisons with three different standard scores (PCE, QRISK3, PREVENT), strengthening the conclusion that the meta-prediction model offers improved risk stratification.
  • Consideration of Metric Limitations: While NRI is informative, it has known limitations (e.g., dependence on the number and choice of categories). The authors supplement this with other metrics (AUROC, AUPRC in Fig 1e,f and continuous NRI/IDI in Supplementary Table 8), providing a more comprehensive evaluation.
Communication
  • Table Structure: The table structure effectively presents the cross-tabulation required for reclassification analysis, clearly showing movement between risk categories for both events (cases) and no-events (controls).
  • Use of Standard Thresholds: Using standard clinical risk thresholds (e.g., <5%, 5-7.5%, 7.5-20%, ≥20% for PCE and PREVENT; <10%, 10-20%, ≥20% for QRISK3) facilitates comparison with established clinical practice.
  • Symbol Clarity: The use of plus (+) and minus (-) signs to indicate favorable and unfavorable reclassification, respectively, is intuitive, although a brief note defining this in the legend would be beneficial.
  • Data Presentation (Percentages): Presenting percentages within each cell allows for easy interpretation of the magnitude of reclassification within each subgroup.
  • NRI Presentation: The final Net Reclassification Index (NRI) is clearly presented and highlighted (bolded) for each comparison, summarizing the overall improvement.
  • Labeling Clarity: The column headers clearly distinguish between 'No event' and 'Event' populations, and the row/column labels specify the risk categories for both the meta-prediction model and the comparator score.
  • Sample Size Information: The sample sizes for events (n=2,889) and no-events (n=29,143) are provided, adding context to the percentages shown.
Fig. 2 | Comparative performance of meta-prediction stratified by standard risk...
Full Caption

Fig. 2 | Comparative performance of meta-prediction stratified by standard risk factors in the UKBB population.

Figure/Table Image (Page 5)
First Reference in Text
The meta-prediction approach resulted in superior performance across all strata explored, with an average 2.2-fold improvement in CAD event enrichment in the top percentile of CAD risk, an average 1.2-fold improvement in AUROC and an average 2-fold improvement in AUPRC per stratum, as observed for the overall cohort (Fig. 2 and Supplementary Table 9).
Description
  • Purpose of Comparison: This figure compares the performance of the study's new 'Meta-prediction' model against three existing clinical risk scores (PCE, QRISK3, PREVENT) for predicting Coronary Artery Disease (CAD).
  • Stratification by Risk Factors: The comparison is not just overall, but is broken down ('stratified') across different subgroups of the UK Biobank population. These subgroups are defined by various standard risk factors, such as age categories (<55 vs ≥55 years), sex (Male vs Female), levels of specific biological measurements (like LDL cholesterol, HbA1c - a measure of blood sugar control, SBP - systolic blood pressure), body measurements (like WHR - waist-hip ratio, BMI - body mass index), and genetic predisposition scores (Polygenic Risk Scores or PRS for CAD, Type 2 Diabetes, LDL, etc.).
  • Performance Metrics: Performance is evaluated using three different measures shown in three stacked panels: 1) 'Incident CAD in top percentile (%)' shows how concentrated the actual CAD cases are within the group predicted to be at highest risk (top 1%); 2) 'AUROC' (Area Under the Receiver Operating Characteristic curve) measures the model's ability to distinguish between people who will get CAD and those who won't (higher is better, 1.0 is perfect, 0.5 is random chance); 3) 'AUPRC' (Area Under the Precision-Recall curve) is another measure of predictive accuracy, particularly useful when the event (like CAD) is relatively rare (higher is better).
  • Visual Representation: Visually, for each risk factor stratum (e.g., 'Age <55 years'), there are bars representing the performance of each model (Meta-prediction in blue, others in different colors) according to the three metrics. The figure generally shows that the blue bar (Meta-prediction) is higher than the others across most strata and metrics.
  • Overall Performance Trend: The text mentions, and the figure visually supports, that the meta-prediction model shows superior performance on average across these different groups, with about a 1.2-fold improvement in AUROC and a 2-fold improvement in AUPRC compared to the other scores.
  • Highlighted Improvements: Annotations at the bottom highlight specific factors (Age, Sex, CAD-PRS) where the meta-prediction model showed particularly strong relative improvement in AUPRC, indicated by numerical fold changes (e.g., 1.2x for Age, 2.0x for Sex).
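The two accuracy metrics used throughout the figure can be computed directly with scikit-learn. The labels and predicted risks below are toy values chosen to mimic a rare-event setting; average precision is used here as the usual estimator of AUPRC.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical labels and predicted risks; positives are rare, like incident CAD.
y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.05, 0.3, 0.2, 0.1, 0.4, 0.8, 0.35]

auroc = roc_auc_score(y_true, y_score)            # rank-based discrimination
auprc = average_precision_score(y_true, y_score)  # precision-recall summary
print(round(auroc, 4), round(auprc, 4))
```

With heavy class imbalance, a model can post a high AUROC while its AUPRC stays modest, which is why the figure's 2-fold AUPRC gains are the more demanding result.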
Scientific Validity
  • Importance of Stratified Analysis: Stratifying the performance analysis by known risk factors is a crucial step to assess model robustness and identify potential subgroup-specific performance variations. This goes beyond overall performance metrics and evaluates fairness and applicability across different demographic and clinical profiles.
  • Relevance of Stratification Factors: The selection of stratification factors covers a relevant range of demographic (age, sex), clinical (PCE, QRISK3, PREVENT scores), biomarker (LDL, HbA1c, SBP), anthropometric (WHR, BMI), and genetic (multiple PRSs) variables pertinent to CAD risk.
  • Use of Multiple Performance Metrics: Evaluating performance using multiple metrics (enrichment, AUROC, AUPRC) provides a more comprehensive picture than relying on a single metric. AUPRC is particularly important given the likely class imbalance in incident CAD prediction.
  • Consistency of Results Across Strata: The consistent outperformance of the meta-prediction model across nearly all strata and metrics strongly supports the claim of its superiority and robustness compared to standard clinical scores and PRS-only approaches implicitly compared via stratification.
  • Identification of High-Impact Subgroups: The analysis highlights subgroups where the meta-prediction model offers particularly large gains (e.g., low-risk strata according to traditional scores, females, low WHR/HbA1c/SBP), suggesting it effectively captures risk missed by conventional methods, which aligns with the authors' interpretation.
  • Statistical Significance within Strata: While the figure shows clear trends, the statistical significance of the performance differences within each stratum is not directly presented in the figure itself (though likely available in supplementary materials like the referenced Table 9), which would further strengthen the conclusions.
Communication
  • Multi-metric presentation: The three-tiered structure, presenting enrichment, AUROC, and AUPRC, allows for a multi-faceted comparison of model performance within a single figure.
  • Comprehensive stratification: Stratifying by a wide range of clinical and genetic risk factors (PCE, age, sex, multiple PRSs, biomarkers like LDL, HbA1c, SBP, anthropometrics like WHR, BMI) effectively demonstrates the model's robustness across diverse subgroups.
  • Consistent color-coding: The consistent color-coding for the different prediction models (Meta-prediction, PCE, QRISK3, PREVENT) across all strata facilitates easy visual comparison.
  • High/Low Strata Comparison: Including both high and low strata for each risk factor allows direct comparison of performance differences within subgroups (e.g., performance in low vs. high LDL groups).
  • Clear labeling of strata: The x-axis labels clearly identify the stratification factor and the specific PRS used (e.g., 'CAD-PRS (PGS003356)', 'LDL PRS (PGS000892)').
  • Annotations and summary plot: The annotations showing the average fold change in AUPRC for specific factors (Age, Sex, CAD-PRS) and the bubble plot summarizing relative AUPRC differences provide a useful summary, highlighting key findings directly on the figure, although the bubble plot's interpretation requires careful reading of the caption/text.
  • Information density vs. clarity: While visually dense due to the amount of information, the layout is logical and allows for systematic comparison across factors and metrics.
Fig. 3 | SHAP summary plot of features in the meta-prediction framework in the...
Full Caption

Fig. 3 | SHAP summary plot of features in the meta-prediction framework in the UKBB population.

Figure/Table Image (Page 7)
Fig. 3 | SHAP summary plot of features in the meta-prediction framework in the UKBB population.
First Reference in Text
Example SHAP plots for non-CAD or non-CAD component meta-features are provided on the right side of Fig. 3.
Description
  • Purpose and Examples: This part of Figure 3 provides detailed examples of how specific 'meta-features' (predictions generated by intermediate models) that are not directly related to Coronary Artery Disease (CAD) contribute to the overall CAD risk prediction. The examples shown are predictions for '20-year mental illness', 'Any-onset nonischemic cardiomyopathy' (CMP, a disease of the heart muscle), and 'Present sleep duration'.
  • SHAP Plot Explanation: Each example uses a SHAP (SHapley Additive exPlanations) plot. SHAP values quantify the impact of each input feature (e.g., age, a specific genetic score) on the prediction output for that specific meta-feature. A positive SHAP value means the feature pushed the prediction higher (e.g., towards predicting mental illness), while a negative value means it pushed it lower.
  • Component Features: Within each example plot (e.g., '20-year mental illness'), individual features used to predict that specific meta-feature are listed vertically (e.g., 'Recent depression', 'Bipolar-I PRS', 'Age stopped smoking').
  • Data Point Representation: Each dot on a plot represents one person in the study cohort (n=33,419). The color indicates the original value of the feature for that person (red=high, blue=low), and its position on the horizontal axis shows its SHAP value (impact on the meta-feature prediction). For example, in the '20-year mental illness' plot, high values (red dots) for 'Recent depression' tend to have positive SHAP values, increasing the predicted likelihood of mental illness.
  • Integrative Nature Highlighted: These examples demonstrate that the final CAD risk model incorporates information derived from predictions about seemingly unrelated conditions like mental health and sleep patterns, highlighting the integrative nature of the meta-prediction framework.
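The SHAP mechanics described above can be illustrated with a minimal, exact Shapley computation on a toy two-feature model (the model, input, and reference point are invented for illustration; production SHAP implementations such as TreeSHAP use far more efficient algorithms):

```python
from itertools import permutations

def shapley_values(f, x, background):
    """Exact Shapley values for model f at point x, imputing 'missing'
    features with a single background reference point."""
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for perm in perms:
        z = list(background)       # start from the reference input
        prev = f(z)
        for j in perm:             # reveal features one at a time
            z[j] = x[j]
            cur = f(z)
            phi[j] += cur - prev   # marginal contribution of feature j
            prev = cur
    return [p / len(perms) for p in phi]

# Toy model with an interaction term between the two features
f = lambda z: 2 * z[0] + z[1] + z[0] * z[1]
x, ref = [1.0, 1.0], [0.0, 0.0]
phi = shapley_values(f, x, ref)
# Additivity: baseline + sum of SHAP values reconstructs the prediction
print(phi, f(ref) + sum(phi) == f(x))
```

The additivity check is the property that makes the dots in each panel interpretable: every individual's SHAP values sum (with the baseline) to that individual's predicted value for the meta-feature.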
Scientific Validity
  • Enhanced Model Interpretability: Using SHAP plots to dissect the contribution of component features to the prediction of specific meta-features (especially non-CAD ones) enhances the interpretability of the complex meta-prediction model. It allows examination of why a certain meta-feature value was predicted.
  • Evidence for Complex Interactions: Demonstrating that predictions of non-cardiovascular outcomes (mental illness, sleep duration) derived from genetic and clinical data contribute to the final CAD risk prediction provides evidence for the model capturing complex, potentially pleiotropic relationships or shared risk pathways.
  • Support for Underlying Hypothesis: The inclusion of features like Polygenic Risk Scores (PRSs) for conditions like Bipolar disorder or Insomnia within these meta-feature predictions, and their subsequent influence on CAD risk, supports the 'omnigenic' hypothesis underlying the study design, where many genetic factors contribute indirectly to disease risk.
  • Validation of Meta-Feature Concept: These examples validate the meta-feature concept, showing that intermediate predictions serve as meaningful inputs to the final model, carrying information beyond the raw features alone.
Communication
  • Illustrative examples: Presenting specific examples of non-CAD meta-features (20-year mental illness, any-onset nonischemic CMP, present sleep duration) effectively illustrates the concept that factors beyond traditional cardiovascular risks contribute to the final CAD prediction within this framework.
  • Clarity of individual SHAP plots: The individual SHAP plots for these examples clearly show the distribution of SHAP values for component features (e.g., Bipolar-I PRS, age stopped smoking for mental illness prediction), mirroring the format of the main summary plot and aiding interpretation.
  • Consistency in presentation: The consistent use of color coding (red for high feature value, blue for low) and axis labeling (SHAP value, feature value) across all panels enhances coherence and readability.
  • Complementary information: These specific examples effectively complement the main SHAP summary plot by providing deeper insight into the composition and behavior of selected meta-features, particularly those less intuitively linked to CAD.
  • Clear plot titles: The titles for each example plot clearly state the meta-feature being explained (e.g., "20-year mental illness").
Fig. 4 | Identification of CAD risk subgroups and distinguishing features in...
Full Caption

Fig. 4 | Identification of CAD risk subgroups and distinguishing features in the UKBB population.

Figure/Table Image (Page 8)
Fig. 4 | Identification of CAD risk subgroups and distinguishing features in the UKBB population.
First Reference in Text
Clustering of the incident cohort CAD cases revealed five distinct subgroups (Fig. 4a).
Description
  • Visualization Type: This panel (Fig. 4a) displays a heatmap, which is a graphical representation of data where values are depicted by color. It visualizes the results of grouping individuals who developed Coronary Artery Disease (CAD) during the study.
  • Clustering Method: The grouping was done using a statistical technique called hierarchical clustering. This method builds a hierarchy of clusters, grouping individuals based on similarity.
  • Basis for Clustering (SHAP Values): The similarity between individuals was determined by their SHAP values. SHAP (SHapley Additive exPlanations) values quantify how much each specific input feature (like age, blood pressure, genetic scores) contributed to the individual's predicted risk of CAD from the meta-prediction model. Clustering on SHAP values groups individuals who have similar reasons (feature contributions) for their predicted risk.
  • Heatmap Interpretation: The heatmap colors represent the correlation (statistical relationship) between the SHAP values of different predictive features across the individuals. Blocks of similar color indicate groups of individuals where the contributions of various features to their predicted risk are correlated in a similar way.
  • Identification of Subgroups: The analysis, as indicated by the reference text and the colored bar below the heatmap, identified five distinct subgroups among the CAD cases. These subgroups are labeled 'Lowest', 'Lower', 'Medium', 'Higher', 'Highest', likely corresponding to different underlying risk profiles or magnitudes.
  • Dendrograms: Dendrograms (tree-like diagrams) are shown along the top and left sides, illustrating the hierarchical structure produced by the clustering algorithm.
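As a sketch of the clustering idea, hierarchical agglomeration can be run directly on rows of per-individual SHAP values. The toy below uses average linkage and Euclidean distance for brevity (the paper's Methods specify Ward's linkage); the data are invented:

```python
def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(points, k):
    """Naive agglomerative clustering (average linkage) down to k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(ca, cb):
        return sum(euclid(points[i], points[j])
                   for i in ca for j in cb) / (len(ca) * len(cb))

    while len(clusters) > k:
        # Merge the closest pair of clusters
        a, b = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy "SHAP matrix": two individuals driven by feature 0, two by feature 1
shap_rows = [[2.0, 0.1], [1.8, 0.0], [0.1, 2.1], [0.0, 1.9]]
print(sorted(map(sorted, agglomerate(shap_rows, 2))))
```

Because the rows are feature *contributions* rather than raw feature values, the resulting groups share a reason for their predicted risk, which is exactly what makes the five subgroups etiologically interpretable.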
Scientific Validity
  • Novelty and Validity of Clustering on SHAP Values: Using SHAP values as the input for clustering is an innovative and valid approach. It allows for the identification of subgroups based on the underlying drivers of prediction (feature contributions) rather than just the input features themselves or the final predicted risk value, potentially revealing distinct etiological pathways or risk profiles captured by the model.
  • Appropriateness of Clustering Algorithm: Hierarchical clustering using Ward's linkage and Euclidean distance (as detailed in Methods) is a standard and appropriate technique for exploratory data analysis and identifying group structures.
  • Focus on Incident Cases: Clustering specifically on the incident CAD cases is a reasonable strategy to identify potentially different risk profiles among those who develop the disease, which can then be characterized and used to assign controls.
  • Determination of Cluster Number: Defining the number of clusters (five) via fixed-height tree cutting is a common heuristic but can be somewhat arbitrary. While visually supported by the heatmap structure, exploring the stability of these clusters or using quantitative metrics to determine the optimal number of clusters would enhance rigor.
  • Foundation for Subsequent Analysis: The identification of distinct subgroups based on model interpretation (SHAP values) provides a foundation for subsequent analyses exploring differential responses to interventions (as shown in Fig. 5), linking model interpretability to actionable insights.
Communication
  • Heatmap visualization: The heatmap visualization effectively conveys the presence of distinct blocks or patterns representing the identified subgroups based on SHAP value correlations.
  • Dendrogram clarity: The accompanying dendrograms clearly illustrate the hierarchical relationships derived from the clustering algorithm, showing how individuals and features are grouped.
  • Subgroup labeling: The color bar at the bottom explicitly labels the five distinct subgroups derived from the clustering, linking the visual patterns in the heatmap to the identified groups.
  • Interpretability limitations: While visually informative about structure, the heatmap itself requires subsequent analysis (like Fig. 4b, 4c) to understand the specific characteristics that define each subgroup; its standalone interpretability regarding why clusters differ is limited.
  • Color gradient use: The use of color gradients within the heatmap effectively represents the correlation strength between SHAP values for different predictors across individuals.
Fig. 5 | Benefit of clinical interventions in genetic risk and risk subgroups...
Full Caption

Fig. 5 | Benefit of clinical interventions in genetic risk and risk subgroups in the UKBB population.

Figure/Table Image (Page 9)
Fig. 5 | Benefit of clinical interventions in genetic risk and risk subgroups in the UKBB population.
First Reference in Text
Similarly, the degree of absolute risk reduction achieved in each CAD subgroup by meeting these targets is presented in Fig. 5d-f,g-i, respectively.
Description
  • Purpose: Absolute Risk Reduction: These three graphs (panels d, e, and f) show the estimated absolute decrease in the 10-year risk of developing Coronary Artery Disease (CAD) for different groups of people when specific medical interventions are applied.
  • Population: Risk Subgroups: The groups of people shown are the five distinct CAD risk subgroups (identified in Figure 4 and represented by different colors) found within the UK Biobank population.
  • Interventions Simulated: Each panel focuses on a different intervention: panel d shows the effect of lowering LDL cholesterol ('bad' cholesterol) to various target levels (35, 55, 70, 100 mg/dl); panel e shows lowering HbA1c (a measure of long-term blood sugar control) to different targets (5%, 5.6%, 6%, 6.5%, 7%); panel f shows lowering systolic blood pressure (SBP) to various targets (100 to 160 mmHg).
  • Axes Interpretation: The vertical axis represents the percentage point reduction in absolute risk (e.g., a value of -4 means a 4% absolute reduction in risk). The horizontal axis shows the different target levels for the intervention.
  • Key Observation: Differential Benefit: The lines show that the amount of absolute risk reduction often differs between subgroups. For example, in panel d (LDL lowering), the highest-risk subgroup (red line) shows a much larger absolute risk reduction when targeting very low LDL levels (e.g., ~4% reduction going from LDL 100 to 35) compared to the lowest-risk subgroup (purple line), which shows very little absolute benefit.
  • Data Representation: The points on the graph represent the median risk reduction for each subgroup at each target level, and the error bars indicate the variability (standard error).
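The in-model intervention simulation amounts to a counterfactual query: set the target analyte to the goal value, re-run the model, and take the difference in predicted risk. A minimal sketch with an invented logistic risk model (coefficients are illustrative, not from the paper):

```python
import math

def risk(ldl, prs):
    """Toy logistic 10-year risk model (illustrative coefficients only)."""
    logit = -5.0 + 0.02 * ldl + 0.8 * prs
    return 1 / (1 + math.exp(-logit))

def absolute_risk_change(person, ldl_target):
    """Re-run the model with LDL set to the target and report the change
    in predicted risk, mirroring the in-model intervention simulation."""
    before = risk(person["ldl"], person["prs"])
    after = risk(ldl_target, person["prs"])
    return after - before

high = {"ldl": 160, "prs": 2.0}    # high-risk profile
low = {"ldl": 160, "prs": -1.0}    # low-risk profile
for p in (high, low):
    print(round(100 * absolute_risk_change(p, 55), 2), "% at LDL target 55")
```

Even in this toy, the same LDL target produces a much larger absolute risk reduction for the high-risk profile, echoing the pattern in panel d where the highest-risk subgroup gains most from aggressive LDL lowering.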
Scientific Validity
  • Validity of Simulation Approach: Simulating the effect of interventions by modifying the target analyte value within the trained prediction model and observing the change in predicted risk is a valid approach to estimate potential intervention benefits, leveraging the model's learned relationships.
  • Assessment of Heterogeneity of Benefit: Presenting results stratified by the SHAP-derived risk subgroups allows for the assessment of heterogeneity of treatment effect (as predicted by the model). This directly tests whether the identified subgroups exhibit differential benefits from standard interventions.
  • Clinical Relevance of Absolute Risk: Focusing on absolute risk reduction is clinically highly relevant, as it directly relates to the number needed to treat and the magnitude of benefit for an individual patient.
  • Biological Plausibility: The observed pattern, particularly for LDL lowering (panel d), where the highest-risk subgroup (which also has high genetic risk) derives the largest absolute benefit, aligns with biological plausibility and previous findings regarding statin benefits in high-PRS individuals.
  • Use of Standard Clinical Targets: The use of standard clinical targets for LDL, HbA1c, and SBP makes the simulation results interpretable in the context of current treatment guidelines.
  • Model-Based Prediction Caveat: While insightful, these are model-based predictions of benefit, not results from actual intervention trials within these subgroups. The validity relies on the accuracy and calibration of the underlying meta-prediction model.
Communication
  • Visualization of Absolute Risk Reduction: These panels (d, e, f) effectively visualize how the absolute amount of predicted risk reduction varies across the five previously identified CAD risk subgroups (color-coded) when simulating standard clinical interventions (LDL lowering, HbA1c lowering, SBP lowering).
  • Dose-Response Relationship: Plotting the median risk change with error bars (representing standard error) against different intervention target levels (e.g., target LDL levels of 35, 55, 70, 100 mg/dl) clearly illustrates the dose-response relationship within each subgroup.
  • Consistent Subgroup Coloring: The consistent color-coding of subgroups, linked back to Fig. 4, allows for easy comparison of intervention benefits across the different risk profiles.
  • Clarity via Separate Panels: Separating the interventions into three distinct panels (d for LDL, e for HbA1c, f for SBP) maintains clarity and avoids overcrowding.
  • Axis Labeling: The y-axis clearly labels 'Absolute risk change (%)', making the interpretation of the benefit straightforward.
Extended Data Fig. 1 | Feature importance and SHAP summary for 10-year...
Full Caption

Extended Data Fig. 1 | Feature importance and SHAP summary for 10-year prospective CAD risk prediction in the UK Biobank.

Figure/Table Image (Page 17)
Extended Data Fig. 1 | Feature importance and SHAP summary for 10-year prospective CAD risk prediction in the UK Biobank.
First Reference in Text
The final 50 prioritized features included 13 directly measured features, 22 PRSs and 15 meta-features (Extended Data Fig. 1).
Description
  • Purpose: Feature Importance: This figure identifies the 50 most important factors ('features') used by the final machine learning model to predict the 10-year risk of developing Coronary Artery Disease (CAD) in the UK Biobank population.
  • Bar Plot (Overall Importance): It uses two types of visualizations. The left panel is a bar chart that ranks the features from most to least important based on their average impact on the model's prediction across all individuals. Importance is measured by the 'mean absolute SHAP value' - a higher bar means the feature generally had a larger impact (positive or negative) on the risk prediction.
  • SHAP Summary Plot (Detailed Impact): The right panel is a SHAP (SHapley Additive exPlanations) summary plot. This provides more detail than the bar chart. Each feature is listed vertically. For each feature, the horizontal spread of dots shows how much that feature influenced the prediction for each individual person in the test group (n=33,419). A dot's position indicates the SHAP value: positive values mean the feature increased the predicted risk, negative values mean it decreased the risk. The color of the dot indicates the feature's value for that person (red for high values, blue for low values).
  • Types of Features: The features shown include different types: directly measured clinical data (like cholesterol HDL ratio, systolic blood pressure - labeled 'biomarkers'), genetic risk scores (Polygenic Risk Scores or 'PRS' for CAD and other conditions), and 'meta-features' (which are themselves predictions from intermediate models, e.g., 'baseline Dx of anyonset revascularization (UFs)' - prediction of having had a revascularization procedure before the study start, based only on unmodifiable factors).
  • Top Ranked Features: The figure highlights that the most important features are a mix of these types, with meta-features related to prior cardiovascular events or procedures, key biomarkers, and specific CAD PRSs appearing high on the list.
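Ranking features by mean absolute SHAP value, as in the left panel, reduces the per-individual SHAP matrix to one global importance number per feature. A minimal sketch with made-up values:

```python
def rank_by_mean_abs_shap(shap_matrix, names):
    """Rank features by mean absolute SHAP value (global importance)."""
    n_rows = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(names)
    }
    return sorted(importance.items(), key=lambda kv: -kv[1])

# Rows = individuals, columns = features (invented SHAP values)
shap = [[0.3, -0.05, 0.1], [-0.4, 0.02, 0.2], [0.5, -0.01, -0.3]]
print(rank_by_mean_abs_shap(shap, ["SBP", "BMI", "CAD PRS"]))
```

Taking the absolute value before averaging is what lets a feature that pushes risk up for some individuals and down for others (like SBP here) still rank as globally important.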
Scientific Validity
  • Appropriateness of SHAP Method: SHAP values are a well-established and theoretically grounded method for explaining the output of complex machine learning models, particularly tree-based ensembles like XGBoost (used in this study). Using mean absolute SHAP values for ranking overall feature importance is standard practice.
  • Model Transparency: The identification of the top 50 features provides transparency into the final prediction model, showing which factors drive the risk estimates.
  • Support for Integrative Approach: The inclusion of diverse feature types (measured clinical data, multiple PRSs for different traits, derived meta-features) among the top 50 supports the study's integrative approach and the hypothesis that combining different data modalities improves prediction.
  • Clinical and Biological Plausibility: The specific features identified as most important (e.g., meta-features for revascularization/MI, HDL ratio, SBP, specific CAD PRSs) align well with established clinical and genetic knowledge of CAD risk factors, lending face validity to the model.
  • Directionality of Effects: The SHAP summary plot provides valuable insight into not just the magnitude but also the direction of feature effects (e.g., showing that higher HDL ratio generally lowers predicted risk, as expected).
Communication
  • Dual Visualization Approach: The figure effectively combines two complementary views: a bar plot showing the overall importance (mean absolute SHAP value) of each feature, ranking them clearly, and a detailed SHAP summary plot showing the distribution and direction of impact for each feature across all individuals.
  • SHAP Plot Clarity: The SHAP summary plot uses standard conventions (color for feature value, position for SHAP value) which are generally well-understood in the machine learning field, facilitating interpretation of feature effects (e.g., high values of a feature pushing risk prediction higher or lower).
  • Clear Feature Labeling: Features are clearly labeled on the y-axis, including specific identifiers for PRSs (Polygenic Risk Scores) and indicating whether meta-features are based on unmodifiable (UFs) or modifiable/unmodifiable factors (MUFs).
  • Feature Type Legend: A legend clearly distinguishes between different feature types (meta-feature, biomarkers, PRS) using color coding in the bar plot, adding another layer of information.
  • Appropriate Level of Detail: Presenting the top 50 features provides substantial detail about the model's composition without being overwhelming.
Extended Data Fig. 2 | Evaluating the calibration and predictive value of...
Full Caption

Extended Data Fig. 2 | Evaluating the calibration and predictive value of feature categories for the meta-prediction model in the UK Biobank.

Figure/Table Image (Page 18)
Extended Data Fig. 2 | Evaluating the calibration and predictive value of feature categories for the meta-prediction model in the UK Biobank.
First Reference in Text
The final meta-prediction model was also found to be the most calibrated among the existing clinical risk scores, all recalibrated with the same cohort (Extended Data Fig. 2a).
Description
  • Purpose: Model Calibration Check: This graph (Panel a) is a calibration plot, designed to check how well the risk percentages predicted by different models match the actual percentage of people who experienced the event (developing Coronary Artery Disease, CAD) within the UK Biobank test cohort (n = 33,419).
  • Axes Definition: The horizontal axis shows the average predicted risk (from 0 to 1, or 0% to 100%) within different groups (usually deciles, or tenths, of predicted risk). The vertical axis shows the actual fraction (percentage) of people within that group who developed CAD.
  • Ideal Calibration: Ideally, if a model predicts a 10% risk for a group, about 10% of people in that group should actually develop CAD. In a perfectly calibrated model, the plotted points would fall exactly on the diagonal dashed line (labeled 'Perfectly calibrated').
  • Models Compared: The plot compares the study's 'meta-prediction' model against existing clinical scores (PCE, QRISK3, PREVENT). It also shows versions of these existing scores that were 'recalibrated' using the same study data, meaning they were adjusted to better fit this specific population.
  • Visual Result: Visually, the points for the 'meta-prediction' model (blue circles) appear closest to the diagonal line compared to the other models (both original and recalibrated), suggesting it is the best calibrated.
  • Brier Score Comparison: The legend provides Brier scores for each model. The Brier score is a single number summarizing the overall accuracy of probabilistic predictions, capturing both calibration and discrimination; lower scores are better. The meta-prediction model has the lowest Brier score (0.0678), followed by the recalibrated clinical scores (around 0.077), while the original clinical scores have higher Brier scores (0.0775 to 0.0811), quantitatively supporting the visual impression of better calibration for the meta-prediction model.
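Both quantities in this panel are straightforward to compute from predicted probabilities and observed outcomes. A sketch with invented data (values and bin counts are illustrative only):

```python
def brier(y_true, probs):
    """Brier score: mean squared error between predicted probability
    and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)

def calibration_bins(y_true, probs, n_bins=10):
    """(mean predicted, observed event fraction) per probability bin,
    the points plotted against the 'perfectly calibrated' diagonal."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, y_true):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            obs = sum(y for _, y in b) / len(b)
            out.append((mean_p, obs))
    return out

probs = [0.05, 0.12, 0.18, 0.25, 0.78, 0.85]
y = [0, 0, 1, 0, 1, 1]
print(round(brier(y, probs), 4))
print(calibration_bins(y, probs, n_bins=5))
```

A well-calibrated model yields bin points hugging the diagonal; the Brier score additionally penalizes poor discrimination, which is why it complements rather than replaces the calibration plot.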
Scientific Validity
  • Importance of Calibration Assessment: Calibration is a critical aspect of evaluating prediction models, especially for clinical risk prediction where the absolute predicted risk informs decisions. Assessing calibration via plots and Brier scores is standard and appropriate.
  • Comparison to Standard Scores: Comparing the new model against established clinical scores (PCE, QRISK3, PREVENT) provides necessary context regarding its performance relative to current standards.
  • Validity of Recalibration: Recalibrating the existing clinical scores using the same study cohort before comparison is methodologically sound. It ensures that any observed difference in calibration is not simply due to the comparator models being developed in different populations, thus providing a fairer assessment of the meta-prediction model's inherent calibration.
  • Use of Brier Score: The Brier score provides a proper scoring rule that assesses both calibration and discrimination simultaneously, offering a robust quantitative measure to complement the visual calibration plot.
  • Implication of Superior Calibration: The finding that the meta-prediction model demonstrates superior calibration suggests that its risk estimates are more reliable across the spectrum of predicted probabilities compared to the standard scores, even after recalibration, enhancing its potential clinical utility.
Communication
  • Standard Plot Format: The calibration plot format is standard and effectively communicates the agreement between predicted probabilities and observed event frequencies.
  • Direct Comparison: Plotting multiple models (meta-prediction, PCE, QRISK3, PREVENT, plus recalibrated versions) on the same axes allows for direct visual comparison of their calibration.
  • Reference Line Clarity: The inclusion of the diagonal line representing perfect calibration provides an immediate visual reference for assessing model performance.
  • Legend and Brier Score Inclusion: The legend clearly identifies each model and its corresponding marker/color. Including the Brier scores directly in the legend provides a concise quantitative summary alongside the visual plot.
  • Distinction of Recalibrated Scores: Distinguishing between the original and recalibrated versions of the clinical scores visually reinforces the importance of calibration within the specific study cohort.
Extended Data Fig. 3 | Comparative performance of CAD risk prediction models in...
Full Caption

Extended Data Fig. 3 | Comparative performance of CAD risk prediction models in the UK Biobank.

Figure/Table Image (Page 19)
Extended Data Fig. 3 | Comparative performance of CAD risk prediction models in the UK Biobank.
First Reference in Text
The model produced improved risk stratification across all percentiles of predicted risk across the 10-year follow-up period with an average twofold enrichment of CAD events at 10 years among the top percentile bin (Extended Data Fig. 3).
Description
  • Purpose: Comparative Performance: This figure provides a detailed comparison of the study's 'meta-prediction' model against numerous other existing models for predicting Coronary Artery Disease (CAD) risk over 10 years within the UK Biobank test population (n = 33,419).
  • Models Compared: The models compared include standard clinical scores (PCE, QRISK3, PREVENT) and various research-based models, including several polygenic risk scores (GPS_CAD, metaGRS_CAD, Aragam_2022, etc.) and machine learning models (ML4H_EN-COX, UKBCRP).
  • Left Panel: Incidence vs. Predicted Risk Percentile: For each model, two plots are shown. The left plot is a scatter plot showing the actual incidence rate (percentage) of CAD observed at 10 years for individuals grouped into percentiles based on their predicted risk score. A steeper curve indicates better performance, as higher predicted risk percentiles correspond to much higher actual event rates.
  • Right Panel: Cumulative Risk Curves by Percentile: The right plot shows cumulative risk curves over the 10-year follow-up period. Individuals are grouped into different predicted risk percentiles (e.g., <1%, 1-5%, 5-10%, ..., >99%), and the plot tracks the proportion within each group who develop CAD over time. Better models show greater separation between the curves for different risk percentiles, indicating they can effectively distinguish between low- and high-risk individuals throughout the follow-up period.
  • Key Observation: Superior Stratification: Visually comparing the plots, the 'meta-prediction' model (top row) shows both a steeper curve in the left panel and wider, more distinct separation between the percentile risk trajectories in the right panel compared to all other models presented.
  • Top Percentile Enrichment: The reference text highlights that the meta-prediction model achieves about a twofold enrichment (concentration) of CAD events in the very highest risk group (top percentile bin) compared to average, which is visually supported by the high incidence rate shown at the far right of its scatter plot.
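Top-percentile enrichment, as cited in the reference text, is simply the event rate in the top predicted-risk bin divided by the overall event rate. A sketch on synthetic data constructed so the fold change comes out to twofold:

```python
def top_bin_enrichment(y_true, scores, top_frac=0.01):
    """Fold enrichment: event rate among the top-scoring fraction
    divided by the overall event rate."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_top = max(1, int(len(scores) * top_frac))
    top_rate = sum(y_true[i] for i in order[:n_top]) / n_top
    overall = sum(y_true) / len(y_true)
    return top_rate / overall

# 1,000 people, 100 events overall (10%); 2 of the top-10 scorers are events
scores = [i / 1000 for i in range(1000)]
y = [0] * 1000
for i in list(range(98)) + [990, 991]:
    y[i] = 1
print(top_bin_enrichment(y, scores, top_frac=0.01))
```

Here the top 1% bin has a 20% event rate against a 10% baseline, i.e., twofold enrichment, matching the structure (though not the data) of the claim in the reference text.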
Scientific Validity
  • Validity of Incidence vs. Percentile Plot: Plotting observed incidence against predicted risk percentiles is a standard way to visualize model calibration and the ability to concentrate events in high-risk groups.
  • Validity of Cumulative Risk Curves: Using stratified cumulative incidence curves (similar to Kaplan-Meier curves stratified by risk percentiles) is an appropriate and informative method to assess how well a model separates risk trajectories over the entire follow-up period.
  • Robust Benchmarking: Comparing the novel meta-prediction model against a comprehensive set of established clinical scores and contemporary research models (including various PRS and ML approaches) provides a robust benchmark for evaluating its performance.
  • Strong Evidence for Improved Stratification: The consistent visual superiority of the meta-prediction model across both types of plots provides strong evidence for its improved risk stratification capabilities compared to existing approaches within this cohort.
  • Granular Performance Assessment: Evaluating performance across the full spectrum of risk percentiles, rather than just using summary statistics like AUROC, provides a more granular understanding of model behavior, particularly at the extremes of risk which are often clinically important.
Communication
  • Dual Plot Presentation: The side-by-side presentation of the incidence scatter plot and the cumulative risk curve for each model provides a comprehensive visual assessment of risk stratification capabilities.
  • Consistent Stratification and Legend: Consistently applying the same percentile groupings and color scheme (shown in the legend) across all cumulative risk plots allows for direct comparison of risk separation achieved by different models.
  • Systematic Comparison Layout: The layout, while dense due to the number of models compared, allows for systematic visual comparison of the meta-prediction model against a wide array of established clinical and research scores.
  • Axis Labeling: Axis labels are clear (Predicted risk percentile, Incidence of CAD, Follow-up time (years), Cumulative risk of CAD), facilitating interpretation.
  • Visual Differentiation of Performance: The visual difference in the steepness of the incidence curve (left panels) and the separation between percentile trajectories (right panels) effectively highlights the superior performance of the meta-prediction model.
Extended Data Fig. 4 | SHAP summary plots for meta-features in the final model...
Full Caption

Extended Data Fig. 4 | SHAP summary plots for meta-features in the final model in the UK Biobank.

Figure/Table Image (Page 20)
Extended Data Fig. 4 | SHAP summary plots for meta-features in the final model in the UK Biobank.
First Reference in Text
Baseline diagnosis predictions were 33% early onset and 67% any onset, made using only unmodifiable predictive features. These predictors include several CAD PRSs as well as the family history of heart disease (Extended Data Fig. 4).
Description
  • Overall Purpose: This figure provides a detailed breakdown of the intermediate prediction models used within the overall framework. It displays individual SHAP summary plots for each of the 'meta-features' that were included as inputs into the final 10-year CAD risk prediction model.
  • Meta-Feature Definition: A 'meta-feature' in this context is the output of a prediction model trained to predict a specific outcome (like having a diagnosis at baseline, or developing a condition in the future). These predictions then become inputs for the final, main prediction model.
  • Individual SHAP Plots: Each small plot in the figure corresponds to one meta-feature. It uses SHAP (SHapley Additive exPlanations) values to show which underlying factors (e.g., age, sex, specific genetic scores, lifestyle factors) were most important for predicting that specific meta-feature.
  • Baseline Diagnosis Meta-Features (UFs): The reference text highlights the 'baseline diagnosis predictions'. Looking at the plots labeled 'baseline Dx...' (e.g., 'baseline Dx of earlyonset coronary artery disease (UFs)'), the figure shows these specific meta-features were predicted using only 'unmodifiable factors' (UFs) – factors that cannot be changed, such as age, sex, family history of heart disease, and various Polygenic Risk Scores (PRSs - scores summarizing genetic predisposition). This is done to avoid issues where a modifiable factor measured at baseline might be influenced by a disease that already exists.
  • Contrast with Future Diagnosis (MUFs): The figure contrasts these baseline UFs predictions with 'future diagnosis predictions' (labeled 'future Dx... (MUFs)'), which used both modifiable and unmodifiable factors as inputs, as indicated by the (MUFs) notation.
  • Key Unmodifiable Predictors Shown: For the baseline diagnosis UFs plots specifically mentioned, features like CAD PRS PGS003356, sex, age, and family history are shown to be important contributors, visually confirming the reference text's statement.
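The two-stage idea described above (predictions from baseline models becoming 'meta-features' for a final model) can be sketched as a simple stacked-generalization pipeline. This is a minimal illustration, not the authors' code: the data are synthetic, the 'unmodifiable' column slice is arbitrary, and a scikit-learn gradient-boosted model stands in for XGBoost.

```python
# Minimal sketch of the meta-prediction (stacking) idea: a baseline model's
# predictions become a 'meta-feature' input to the final model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: a baseline model trained only on 'unmodifiable' columns (here,
# arbitrarily the first 5 features) produces a meta-feature. Out-of-fold
# predictions are used on the training set to avoid leakage.
unmod = slice(0, 5)
base = LogisticRegression(max_iter=1000)
meta_tr = cross_val_predict(base, X_tr[:, unmod], y_tr, cv=5,
                            method="predict_proba")[:, 1]
base.fit(X_tr[:, unmod], y_tr)
meta_te = base.predict_proba(X_te[:, unmod])[:, 1]

# Stage 2: the final model sees the raw features plus the meta-feature.
X_tr_aug = np.column_stack([X_tr, meta_tr])
X_te_aug = np.column_stack([X_te, meta_te])
final = GradientBoostingClassifier(random_state=0).fit(X_tr_aug, y_tr)
auc = roc_auc_score(y_te, final.predict_proba(X_te_aug)[:, 1])
```

Restricting the stage-1 inputs to unmodifiable columns mirrors the (UFs) convention in the figure: the meta-feature cannot absorb modifiable measurements that might themselves be altered by pre-existing disease.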
Scientific Validity
  • Mitigation of Reverse Causation: Generating meta-features for baseline diagnoses using only unmodifiable factors (age, sex, genetics, family history) is a methodologically sound approach to mitigate reverse causation bias, where the presence of the disease might influence the measurement of modifiable risk factors.
  • Interpretability of Meta-Features: Using SHAP plots to explain the contribution of base features to each meta-feature prediction adds a layer of interpretability to the complex multi-stage modeling framework.
  • Plausibility of Predictors: The features identified as important for predicting baseline CAD diagnoses using only UFs (various CAD PRSs, age, sex, family history) align well with established knowledge of non-modifiable risk factors for CAD, providing face validity for these intermediate models.
  • Transparency of Modeling Steps: This detailed visualization of meta-feature construction allows researchers to scrutinize the intermediate steps of the meta-prediction model, increasing transparency and allowing for assessment of whether these intermediate predictions are plausible.
  • Methodological Rigor (UF vs MUF): The distinction in input features used for baseline (UFs only) versus future (MUFs) predictions demonstrates careful consideration of potential biases in the modeling strategy.
Communication
  • Granularity and Transparency: Presenting individual SHAP plots for each meta-feature provides a high level of granularity and transparency into the construction of these intermediate predictions.
  • Consistent Plot Format: The consistent format across all subplots (SHAP summary plot) allows for systematic comparison of feature contributions across different meta-feature predictions.
  • Clear Labeling and Notation: Clear titles for each subplot identify the specific meta-feature being explained (e.g., 'baseline Dx of earlyonset coronary artery disease (UFs)'). The inclusion of '(UFs)' or '(MUFs)' notation clearly distinguishes the type of input features used.
  • Support for Reference Text: The figure effectively supports the reference text by visually demonstrating that baseline diagnosis meta-features (those labeled 'baseline Dx... (UFs)') are indeed predicted using unmodifiable features like age, sex, family history, and various PRSs.
  • Information Density: Due to the large number of meta-features, the figure is necessarily dense. While comprehensive, it may require significant effort from the reader to examine specific plots of interest.
Extended Data Fig. 5 | External validation of the streamlined meta-prediction.
Figure/Table Image (Page 22)
Extended Data Fig. 5 | External validation of the streamlined meta-prediction.
First Reference in Text
The streamlined model trained on UKBB achieved an AUROC of 0.81, with no loss of accuracy when tested in the full AoU cohort (AUROC 0.81) (Extended Data Fig. 5a).
Description
  • Plot Type: ROC Curve: This graph (Panel a) displays Receiver Operating Characteristic (ROC) curves, which are used to evaluate the performance of a prediction model, specifically its ability to distinguish between individuals who will experience an event (like developing Coronary Artery Disease, CAD) and those who will not.
  • Comparison Groups: The plot compares the performance of a 'streamlined' version of the meta-prediction model (trained on UK Biobank data, UKBB) when tested on several different groups: the original UKBB test set, the entire external validation cohort from the All of Us (AoU) research program, and specific self-reported ancestry groups within AoU (European - EUR, African - AFR, Hispanic - HIS).
  • ROC Curve Axes: Each curve plots the true positive rate (sensitivity, or the proportion of actual CAD cases correctly identified as high risk) against the false positive rate (1-specificity, or the proportion of non-cases incorrectly identified as high risk) at various prediction thresholds.
  • AUROC Interpretation: A curve closer to the top-left corner indicates better performance. The overall performance is summarized by the Area Under the ROC Curve (AUROC). An AUROC of 1.0 represents a perfect model, while 0.5 represents a model no better than random chance.
  • Key AUROC Results (UKBB vs AoU): The legend shows the AUROC values. The model achieved an AUROC of 0.81 [95% Confidence Interval 0.80-0.82] in the UKBB test set. When tested on the full AoU cohort, it achieved an identical AUROC of 0.81 [0.80-0.82], as stated in the reference text.
  • AUROC Across Ancestries: Performance varied slightly across AoU ancestry groups: AUROC was 0.81 [0.80-0.82] for European ancestry, 0.79 [0.77-0.81] for African ancestry, and 0.84 [0.82-0.86] for Hispanic ancestry.
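The quantities plotted in this panel (true positive rate vs. false positive rate, AUROC, bootstrap confidence interval) can be computed as below. The scores and labels are simulated for illustration only; they are not study data.

```python
# Illustrative ROC/AUROC computation of the kind summarized in the panel.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=5000)
# Simulate a predictor: cases receive a higher-mean score than non-cases.
scores = rng.normal(loc=y * 1.2, scale=1.0)

# One (fpr, tpr) pair per threshold traces out the ROC curve.
fpr, tpr, thresholds = roc_curve(y, scores)
auc = roc_auc_score(y, scores)

# Percentile-bootstrap 95% confidence interval for the AUROC.
boot = []
for _ in range(200):
    idx = rng.integers(0, len(y), size=len(y))
    if len(np.unique(y[idx])) == 2:  # need both classes in the resample
        boot.append(roc_auc_score(y[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUROC {auc:.2f} [{lo:.2f}-{hi:.2f}]")
```

The bootstrap here is a generic approach for the bracketed intervals shown in the legend; the paper's exact CI method may differ.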
Scientific Validity
  • Importance of External Validation: External validation using an independent cohort (AoU) with different demographics, recruitment strategies, and potentially data structures compared to the training cohort (UKBB) is essential for assessing the generalizability and robustness of a prediction model.
  • Choice of Validation Cohort (AoU): The All of Us (AoU) research program, known for its diverse population, is an appropriate and valuable cohort for testing the model's performance across different ancestries.
  • Appropriateness of ROC/AUROC: Using ROC curves and AUROC is a standard and valid method for evaluating the discriminative performance of prediction models.
  • Evidence for Generalizability: The finding that the AUROC remained stable (0.81) when moving from the UKBB test set to the full AoU cohort strongly supports the model's generalizability and robustness.
  • Assessment Across Ancestries: Stratifying the validation by self-reported ancestry groups provides crucial information about potential performance disparities. While the performance is broadly similar (AUROC 0.79-0.84), the observed variations highlight the ongoing need for diverse training data and ancestry-specific model evaluation.
  • Use of Streamlined Model: The use of a 'streamlined' model for validation (likely involving feature mapping between cohorts) is a practical approach, though any differences between the streamlined and full models should be considered.
Communication
  • Standard Visualization: The ROC curve is a standard and effective visualization for comparing the discriminative ability of diagnostic or prediction models.
  • Direct Comparison Across Cohorts: Plotting multiple ROC curves on the same axes allows for direct visual comparison of the model's performance across different validation cohorts (UKBB, AoU overall, AoU ancestry groups).
  • Clear Legend with AUROC Values: The legend clearly identifies each curve and provides the corresponding Area Under the Curve (AUROC) value with its 95% confidence interval, summarizing the performance quantitatively.
  • Visual Distinction: Using different colors/line styles for each cohort/group enhances distinguishability.
  • Confidence Interval Display: The inclusion of shaded areas representing the 95% confidence intervals for the AUROC values provides important information about the uncertainty of the performance estimates.
  • Support for Reference Text: The plot directly supports the key finding mentioned in the reference text – that the AUROC in the full AoU cohort (0.81) is identical to the AUROC in the UKBB test set (0.81), demonstrating robustness.

Discussion

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Extended Data Fig. 6 | SHAP explanation of streamlined meta-prediction in UK...
Full Caption

Extended Data Fig. 6 | SHAP explanation of streamlined meta-prediction in UK Biobank and All of Us research program.

Figure/Table Image (Page 23)
Extended Data Fig. 6 | SHAP explanation of streamlined meta-prediction in UK Biobank and All of Us research program.
First Reference in Text
AoU cohorts at the individual level as demonstrated by the resultant SHAP plots (Extended Data Fig. 6).
Description
  • Purpose: Compare Model Explanations: This figure presents a comparison of the explanations for the 'streamlined' Coronary Artery Disease (CAD) risk prediction model, showing which factors are important in two different large population studies: the UK Biobank (UKBB) and the All of Us (AoU) research program.
  • Method: SHAP Summary Plots: It uses SHAP (SHapley Additive exPlanations) summary plots, displayed side-by-side. Each plot shows the top 50 features contributing to the model's predictions within that specific cohort (UKBB on the left, AoU on the right).
  • Plot Structure: Features are listed vertically, ranked by their overall importance (average impact) within each cohort. The horizontal axis represents the SHAP value, indicating the impact of a feature on the predicted risk for an individual (positive values increase risk, negative values decrease risk).
  • Data Representation: Each dot represents an individual participant. The color of the dot shows the value of the feature for that person (red=high, blue=low). The spread and color patterns help understand how feature values relate to risk prediction impact.
  • Key Observation: Similarity Across Cohorts: By comparing the left (UKBB) and right (AoU) panels, the figure demonstrates that the relative importance rankings and the general patterns of how features influence predictions (e.g., high values of certain features increasing risk) are largely similar across both cohorts, despite the differences between the populations.
  • Consistent Important Features: Features like specific CAD Polygenic Risk Scores (PRS), meta-features related to cardiovascular history (e.g., 'future Dx of 20-year peripheral artery disease'), and biomarkers (e.g., 'cholesterol HDL ratio') appear important in both cohorts.
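The mechanics behind a SHAP summary plot can be seen most simply in the linear case, where the Shapley value of feature i has the closed form φᵢ = wᵢ(xᵢ − E[xᵢ]) and the values sum, with the mean prediction as baseline, to each individual's prediction. The sketch below demonstrates that additivity property and the mean-absolute-SHAP ranking used for global importance; it is purely illustrative (the paper applied tree-based SHAP to an XGBoost model).

```python
# SHAP values for a linear model, computed in closed form, to illustrate the
# additivity and ranking properties that summary plots rely on.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
w = np.array([0.8, -0.5, 0.3, 0.0])
f = X @ w  # a linear 'risk score' model

baseline = f.mean()
# One SHAP value per individual per feature: phi_i = w_i * (x_i - mean(x_i)).
phi = (X - X.mean(axis=0)) * w

# Additivity: baseline + row-sum of SHAP values reconstructs each prediction.
recon = baseline + phi.sum(axis=1)

# Global importance (the bar-plot ranking) = mean absolute SHAP per feature.
importance = np.abs(phi).mean(axis=0)
```

In the summary plots, each dot is one row of `phi` for one feature, colored by the corresponding entry of `X`; comparing UKBB and AoU amounts to comparing these per-cohort matrices.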
Scientific Validity
  • Assessing Explanatory Consistency: Comparing SHAP explanations between the training/internal validation cohort (UKBB) and an external validation cohort (AoU) is a valuable step beyond simply comparing performance metrics like AUROC. It assesses whether the model is relying on similar underlying relationships and feature contributions in different populations, adding another layer to the assessment of generalizability.
  • Appropriateness of SHAP for Comparison: The use of SHAP values provides a rigorous, theoretically grounded method for attributing prediction impact to individual features, suitable for comparing model behavior across datasets.
  • Evidence for Robustness/Generalizability: Demonstrating that the feature importance hierarchy and the direction/distribution of SHAP values are broadly consistent between UKBB and the diverse AoU cohort strengthens the evidence for the model's robustness and suggests it is not overly tuned to specific characteristics of the UKBB population.
  • Increased Confidence in Model Mechanisms: Observing consistency in SHAP explanations increases confidence that the model is capturing genuine biological and clinical signals relevant to CAD risk, rather than spurious correlations specific to one dataset.
  • Potential for Identifying Subtle Differences: While overall patterns are similar, subtle differences in feature rankings or SHAP distributions could warrant further investigation; they may reflect true population differences or variations in data collection, though the figure primarily emphasizes the similarities.
Communication
  • Side-by-Side Comparison: The side-by-side presentation of SHAP summary plots for the UK Biobank (UKBB) and the All of Us (AoU) cohorts allows for direct visual comparison of feature importance rankings and effect distributions between the training/internal validation and external validation datasets.
  • Consistent Format: Using the same set of features and the same visualization format (SHAP summary plot) for both cohorts facilitates a clear assessment of consistency in feature contributions.
  • Clear Labeling: The labeling of cohorts ('UK Biobank', 'All of Us') and the consistent feature labels on the y-axis are clear.
  • Standard Conventions: The standard SHAP plot conventions (color for feature value, x-axis for SHAP value) are maintained, aiding interpretability for those familiar with the method.
  • Visual Demonstration of Consistency: The figure effectively illustrates the general similarity in the patterns of feature importance and effect direction across the two cohorts, supporting the claim of model robustness and consistent behavior.
Extended Data Fig. 7 | Overview of generalizable genetic meta-prediction model...
Full Caption

Extended Data Fig. 7 | Overview of generalizable genetic meta-prediction model in the UK Biobank.

Figure/Table Image (Page 24)
Extended Data Fig. 7 | Overview of generalizable genetic meta-prediction model in the UK Biobank.
First Reference in Text
The generalizable genetic model continued to be well calibrated and to show superior accuracy compared with existing clinical risk tools and non-UKBB-derived PRS models, achieving 47% cumulative risk for CAD at the highest percentile of risk at 10 years of follow-up (Extended Data Fig. 7a-c), as well as achieving an area under the curve (AUC) of 0.80
Description
  • Purpose: Calibration Assessment: This graph (Panel a) is a calibration plot assessing how well the predicted risks from the 'generalizable genetic meta-prediction' model align with the actual observed frequency of Coronary Artery Disease (CAD) events in the UK Biobank test cohort (n=33,419).
  • Models Compared: It compares the calibration of this genetic model against standard clinical risk scores (PCE, QRISK3, PREVENT) that have been recalibrated for the UKBB cohort, and a standard polygenic score (GPS_CAD).
  • Axes Definition: The horizontal axis represents the average predicted 10-year risk within deciles (tenths) of the population, and the vertical axis represents the actual observed proportion of individuals experiencing CAD within those deciles.
  • Interpretation of Calibration: Points falling close to the dashed diagonal line indicate good calibration (predicted risk matches observed risk). The plot shows the points for the generalizable genetic model (red squares) are relatively close to the diagonal.
  • Brier Score Result: The Brier score, a measure combining calibration and discrimination (lower is better), is reported as 0.0735 for the generalizable genetic model, which is better (lower) than the scores for the recalibrated clinical models (0.077-0.078) and GPS_CAD (0.078), supporting the claim of good calibration.
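The decile calibration check and Brier score described above can be computed as follows. The predicted risks and outcomes here are simulated (outcomes are drawn at the predicted probabilities, so calibration is perfect by construction); they are not the study's data.

```python
# Sketch of a decile calibration check and Brier score like those in panel a.
import numpy as np

rng = np.random.default_rng(0)
pred = rng.uniform(0.01, 0.5, size=20000)             # predicted 10-year risks
y = (rng.uniform(size=pred.size) < pred).astype(int)  # outcomes drawn at those risks

# Bin individuals into deciles of predicted risk, then compare the mean
# predicted risk with the observed event rate within each decile.
order = np.argsort(pred)
deciles = np.array_split(order, 10)
mean_pred = np.array([pred[d].mean() for d in deciles])
obs_rate = np.array([y[d].mean() for d in deciles])

# Brier score: mean squared difference between prediction and outcome
# (lower is better; it reflects both calibration and discrimination).
brier = np.mean((pred - y) ** 2)
```

Plotting `obs_rate` against `mean_pred` yields the calibration plot; points on the diagonal indicate predicted risk matching observed risk, as in the panel.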
Scientific Validity
  • Importance of Calibration: Assessing calibration is crucial for any risk prediction model, especially one intended for broad application. Using calibration plots and Brier scores provides a standard and rigorous evaluation.
  • Fair Comparison: Comparing against recalibrated clinical scores ensures a fair assessment relative to standard tools adjusted for the study population.
  • Assessment of Generalizability: The development and validation of a 'generalizable genetic model' (excluding UKBB-derived PRSs) addresses potential overfitting concerns and evaluates the portability of the genetic component of the risk prediction.
  • Reliability of Risk Estimates: The demonstrated good calibration (Brier score 0.0735, visual plot) supports the reliability of the risk estimates produced by this genetic-focused model.
Communication
  • Standard Plot Format: The calibration plot format is standard and effectively displays the agreement between the predicted risks of the generalizable genetic model and the observed event rates.
  • Clear Comparison Context: Comparing the genetic model against recalibrated standard clinical scores (PCE, QRISK3, PREVENT) and a PRS-only score (GPS_CAD) provides clear context for its calibration performance.
  • Clarity of Reference and Legend: The diagonal line for perfect calibration and the legend identifying models and Brier scores are clearly presented.
  • Visual and Quantitative Support: The plot visually supports the text's claim that the model is well-calibrated, with points generally close to the diagonal, and quantitatively supported by the Brier score (0.0735) being lower than the recalibrated comparator scores.
Extended Data Fig. 8 | Feature importance and SHAP summary for 10-year...
Full Caption

Extended Data Fig. 8 | Feature importance and SHAP summary for 10-year prospective CAD risk prediction of generalizable genetic model in the UK Biobank.

Figure/Table Image (Page 25)
Extended Data Fig. 8 | Feature importance and SHAP summary for 10-year prospective CAD risk prediction of generalizable genetic model in the UK Biobank.
First Reference in Text
Feature importance and counts for each individual feature are presented in Extended Data Fig. 8 and Supplementary Table 15.
Description
  • Purpose: Explain Generalizable Genetic Model: This figure explains which factors ('features') are most important for the 'generalizable genetic meta-prediction' model's ability to predict 10-year Coronary Artery Disease (CAD) risk. This specific model was designed to be more generalizable by excluding any genetic risk scores (PRSs) that were developed using the UK Biobank (UKBB) data itself.
  • Methodology: SHAP and Bar Plot: Similar to Extended Data Fig. 1, it uses two plots based on SHAP (SHapley Additive exPlanations) values, calculated on the UKBB test cohort (n=33,419). The left panel is a bar chart ranking the top 50 features by their average impact (mean absolute SHAP value).
  • SHAP Summary Plot Details: The right panel is a detailed SHAP summary plot. Each feature is listed vertically. Dots represent individual people; their horizontal position shows the feature's impact (SHAP value) on their predicted risk (positive=higher risk, negative=lower risk), and the color shows the feature's value (red=high, blue=low).
  • Feature Types Included: The features shown include directly measured biomarkers (e.g., cholesterol HDL ratio, systolic blood pressure), meta-features (predictions from intermediate models, like 'baseline Dx of anyonset coronary artery disease (UFs)'), and non-UKBB derived PRSs (e.g., PGS000667 for lipoprotein(a)).
  • Key Findings: The figure shows that even without UKBB-derived PRSs, meta-features representing baseline diagnoses (using only unmodifiable factors), key biomarkers like HDL ratio and SBP, and certain non-UKBB PRSs remain highly important predictors in this generalizable model.
Scientific Validity
  • Importance of Explaining the Generalizable Model: Presenting SHAP-based feature importance for the generalizable genetic model is crucial for understanding what drives its predictions, especially since it was constructed differently (excluding UKBB-derived PRSs) from the primary model.
  • Assessing Plausibility for Generalizability: This analysis helps assess whether the model relies on plausible factors even after removing potentially cohort-specific genetic signals, supporting its claim of generalizability.
  • Robustness of Non-PRS Components: The continued high importance of meta-features (especially baseline diagnoses derived from UFs) and established biomarkers suggests these components are robust predictors independent of the specific PRSs used.
  • Identification of Generalizable Genetic Factors: Identifying which non-UKBB PRSs contribute significantly (e.g., Lp(a) PRS) provides insight into universally relevant genetic risk factors captured by the model.
  • Comparison Basis (Implicit): Comparing this feature importance profile (implicitly) to that of the main model (Extended Data Fig. 1) allows an assessment of how excluding UKBB-derived PRSs altered the model's reliance on other features.
Communication
  • Consistent Visualization Format: The figure effectively uses the same dual visualization format (bar plot for overall importance, SHAP summary plot for detailed impact) as Extended Data Fig. 1, providing consistency in presentation.
  • Complementary Views: The side-by-side bar plot and SHAP summary clearly convey both the ranking of features and the nature of their contribution (magnitude, direction, interaction with feature value) for this specific generalizable genetic model.
  • Clear Labeling: Feature labels are clear, including notations for meta-features (UFs/MUFs) and specific PRS identifiers (though these PRSs are explicitly non-UKBB derived for this model).
  • Helpful Legend: The legend distinguishing feature types (meta-feature, biomarker, PRS) aids interpretation of the bar plot.
  • Effective Communication of Model Drivers: This figure successfully communicates the key drivers of the prediction model specifically designed to exclude potentially overfitting UKBB-derived PRSs, highlighting the importance of other genetic signals and clinical/meta-features.

Methods

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Extended Data Fig. 9 | Feature distribution pre- and post-imputation in the UK...
Full Caption

Extended Data Fig. 9 | Feature distribution pre- and post-imputation in the UK Biobank.

Figure/Table Image (Page 26)
Extended Data Fig. 9 | Feature distribution pre- and post-imputation in the UK Biobank.
First Reference in Text
Not explicitly referenced in main text
Description
  • Purpose: Show Effect of Imputation: This figure shows how the statistical distributions of various participant characteristics ('features') looked before and after a data processing step called imputation was performed.
  • Imputation Explained: Imputation is a technique used to fill in missing data points. When collecting large amounts of health data, some information might be missing for certain individuals. Machine learning models often require complete data, so imputation methods estimate what the missing values might have been based on other available information for that person and patterns in the overall dataset.
  • Numeric Feature Visualization: The figure displays several plots, separated for males and females. For numeric features (like Body Mass Index, Blood Pressure, Cholesterol levels), it uses density plots, which show the shape of the data distribution (like a smoothed histogram). The distribution before imputation is shown with a bold edge, and the distribution after imputation is shown with a lighter edge, allowing comparison.
  • Categorical Feature Visualization: For categorical features (like smoking status, alcohol status, medication use), it uses stacked bar charts. Two sets of bars are shown side-by-side for each category: one representing the proportions before imputation and one representing the proportions after imputation.
  • Key Observation: Distribution Preservation: The key observation across the plots is that the distributions after imputation (light edges or right-side bars) look very similar to the distributions before imputation (bold edges or left-side bars). This indicates that the imputation process did not significantly distort the original patterns in the data for these features.
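A check of the kind this figure performs, imputing missing values and then verifying that the feature's distribution is preserved, can be sketched as below. The paper used MICE with random forests (the miceRanger R package); KNN imputation on synthetic data is used here only as a stand-in.

```python
# Impute missing values, then compare pre- vs post-imputation distributions.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
n = 5000
# Two correlated synthetic 'biomarkers'; the second will be partly missing.
x1 = rng.normal(5.7, 1.1, size=n)           # e.g. a cholesterol-like measure
x2 = 0.6 * x1 + rng.normal(0, 0.5, size=n)  # a correlated feature
X = np.column_stack([x1, x2])

X_miss = X.copy()
mask = rng.uniform(size=n) < 0.2            # 20% missing at random in column 2
X_miss[mask, 1] = np.nan

X_imp = KNNImputer(n_neighbors=10).fit_transform(X_miss)

# Compare the observed (pre-imputation) distribution with the completed one.
pre_mean, pre_sd = np.nanmean(X_miss[:, 1]), np.nanstd(X_miss[:, 1])
post_mean, post_sd = X_imp[:, 1].mean(), X_imp[:, 1].std()
```

In the figure this comparison is done visually (overlaid density plots, paired bars); here the means and SDs agreeing closely plays the same quality-control role.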
Scientific Validity
  • Necessity and Appropriateness of Imputation: Imputation is a necessary step when dealing with missing data in large datasets like UK Biobank to enable the use of many predictive modeling techniques. The chosen method, Multiple Imputation by Chained Equations (MICE) using random forests (via miceRanger package, as stated in Methods), is a sophisticated and appropriate approach for handling missing data in complex datasets with various variable types.
  • Importance of Post-Imputation Checks: Visualizing the distributions of key features before and after imputation is a crucial quality control step. It verifies that the imputation method has not introduced significant bias or artifacts that could distort the data structure and potentially affect downstream modeling results.
  • Evidence of Successful Imputation: The observed preservation of distributions across different feature types (numeric, categorical) and stratified by sex suggests that the imputation was performed successfully and reliably filled in missing values without substantially altering the underlying data characteristics.
  • Visual vs. Quantitative Checks: While visual inspection is helpful, quantitative comparisons (e.g., comparing means, variances, or using statistical tests like Kolmogorov-Smirnov, although potentially overpowered in large datasets) could provide additional formal evidence of distribution preservation, though often visual checks are deemed sufficient for this purpose.
  • Methodological Rigor: This step enhances the overall methodological rigor of the study by demonstrating careful handling of missing data, a common challenge in large biobank analyses.
Communication
  • Comparative Visualization: The side-by-side or overlaid presentation format (density plots for numeric, stacked bars for categorical) effectively allows for direct visual comparison of distributions before and after imputation.
  • Clear Distinction: Distinguishing pre- and post-imputation distributions using different line styles (bold vs. light edges for density plots) or adjacent bars (for categorical) is clear.
  • Stratification by Sex: Stratifying the plots by sex (male/female) provides additional granularity and allows assessment of whether imputation affected distributions differently in men and women.
  • Representative Feature Selection: The figure covers a representative range of feature types (anthropometric, blood pressure, biomarkers, lifestyle factors), illustrating the imputation process across different data types.
  • Labeling Clarity: Labels for features and categories are clear and legible.
  • Placement and Transparency: While not explicitly referenced in the main results/discussion, its placement in support of the Methods section provides valuable transparency regarding data preprocessing.
Extended Data Table 1 | Baseline characteristics of the UK Biobank participants...
Full Caption

Extended Data Table 1 | Baseline characteristics of the UK Biobank participants in the study (n=339,667)

Figure/Table Image (Page 27)
Extended Data Table 1 | Baseline characteristics of the UK Biobank participants in the study (n=339,667)
First Reference in Text
Not explicitly referenced in main text
Description
  • Purpose: Baseline Characteristics: This table provides a detailed summary of the starting characteristics ('baseline characteristics') of the participants included in the study from the UK Biobank dataset. The total number of participants summarized is 339,667.
  • Cohorts Compared: It separates the participants into the two groups defined in the study: the 'Prevalent CAD cohort' (179,508 people used to train the baseline diagnosis models, which includes participants both with and without CAD at enrollment) and the 'Incident CAD cohort' (160,159 people who were free of CAD at baseline and were followed to see whether they developed it).
  • Types of Characteristics: For each cohort, the table lists numerous characteristics, including demographics (average age around 57-58 years, ~44-46% male, predominantly White ethnicity), prevalence of various diagnoses at baseline (like Myocardial Infarction, Diabetes), lifestyle factors (smoking, alcohol use), medication use (e.g., cholesterol-lowering drugs), family history of diseases, body measurements (BMI, waist-hip ratio), cardiovascular measurements (blood pressure), levels of various substances in the blood ('Biomarkers' like cholesterol, glucose, HbA1c), and average scores on standard clinical risk prediction tools (PCE, QRISK3, PREVENT).
  • Reporting Format (Continuous): For characteristics measured numerically (like age or biomarker levels), the table shows the average value plus or minus the standard deviation (SD), which indicates the typical spread or variability around the average. For example, the average systolic blood pressure was 138.11 (SD 18.7) mmHg in the prevalent cohort and 138.47 (SD 18.78) mmHg in the incident cohort.
  • Reporting Format (Categorical): For characteristics that fall into categories (like sex or smoking status), the table shows the number of people (count) and the percentage (%) in each category. For example, 9.02% of the prevalent cohort had a CAD diagnosis at baseline (by definition), while 0% of the incident cohort did.
  • Cohort Differences: Comparing the two columns allows observation of differences between the groups at baseline. For instance, 9.02% of the prevalent cohort had a CAD diagnosis at baseline, whereas the incident cohort had 0% by definition; the prevalent cohort also had higher rates of medication use (e.g., 28.35% on cholesterol-lowering drugs vs. 22.54% in the incident cohort) and somewhat different average biomarker levels and risk scores, as expected given the presence of existing disease in part of that group.
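The two reporting formats described above (mean ± SD for continuous variables, count (%) for categorical ones, stratified by cohort) are straightforward to produce with pandas. The data below are fabricated for illustration; column names are hypothetical.

```python
# Sketch of 'Table 1'-style summaries stratified by cohort.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "cohort": rng.choice(["prevalent", "incident"], size=n),
    "sbp": rng.normal(138, 19, size=n),             # systolic BP, mmHg
    "chol_lowering": rng.uniform(size=n) < 0.25,    # on medication (yes/no)
})

# Continuous variable: mean +/- SD per cohort.
cont = df.groupby("cohort")["sbp"].agg(["mean", "std"]).round(2)
cont["mean_sd"] = cont["mean"].astype(str) + " (" + cont["std"].astype(str) + ")"

# Categorical variable: count and percentage per cohort.
cat = (df.groupby("cohort")["chol_lowering"]
         .agg(count="sum", pct=lambda s: 100 * s.mean())
         .round(2))
```

`cont["mean_sd"]` yields strings like "138.11 (18.7)", matching the table's continuous format, while `cat` gives the count-and-percentage format used for categorical rows.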
Scientific Validity
  • Standard Practice (Table 1): Presenting baseline characteristics (often referred to as 'Table 1' in clinical studies) is a standard and essential practice in cohort studies and prediction model development papers. It allows readers to understand the population studied and assess its representativeness and potential sources of bias.
  • Comprehensive Variable Selection: The selection of variables covers a comprehensive range of domains relevant to cardiovascular risk, including demographics, established risk factors, comorbidities, biomarkers, and existing risk scores.
  • Appropriate Cohort Separation: Separately characterizing the prevalent and incident cohorts is methodologically crucial, as these cohorts are used for different purposes in the modeling process (training baseline models vs. training/testing incident models). Comparing their characteristics helps understand potential differences that might influence model development and interpretation.
  • Appropriate Summary Statistics: The use of standard summary statistics (mean ± SD for continuous, count (%) for categorical) is appropriate for describing the central tendency and distribution of baseline variables.
  • Robust Estimates due to Sample Size: The large sample sizes provide robust estimates of the baseline characteristics within these specific UK Biobank subsets.
  • Context for Main Findings: While the table itself doesn't involve hypothesis testing, the detailed characterization provides crucial context for interpreting the main study findings regarding prediction model performance and generalizability.
Communication
  • Clear Structure: The table is well-structured, clearly separating characteristics for the prevalent and incident CAD cohorts into distinct columns.
  • Logical Grouping: Variables are logically grouped into categories (Demographic, Diagnosis, Lifestyle, Medication Use, Family History, Body Composition, Cardiovascular Metric, Biomarker, Clinical Risk Score), enhancing readability.
  • Consistent Units: Units are consistently provided for continuous variables (e.g., years, %, mmHg, g/L, mmol/L), which is essential for interpretation.
  • Comprehensive Statistics: Both counts and percentages are provided for categorical variables, and means with standard deviations are given for continuous variables, offering comprehensive summary statistics.
  • Clear Sample Sizes: The total sample size for each cohort (N=179,508 for prevalent, N=160,159 for incident) is clearly stated in the column headers.
  • Table Length: While comprehensive, the table is quite long. However, its clear organization mitigates potential readability issues for a supplementary table.